10 Tangent
Our next root-finding method also starts from the calculus review. The derivative produces great linear approximations to functions, and it is easy to find the root of a line. The lines produced by the derivative are asymptotically the best, meaning they’re more accurate in small neighborhoods, so this estimate will work better if we start near a root, a heuristic we will soon make precise. This is typically called Newton’s method.
The method is as follows.
Let \(f\) be a differentiable function, and \(x_0\) an initial guess. We inductively construct a sequence \((x_n)_{n\in \mathbb N}\) by finding roots of best linear approximations.
The best linear approximation to \(f\) at \(x_n\) is the line \[L_{f,x_n}(x) = f(x_n) + f'(x_n) (x-x_n)\] whose root is \(x_{n+1}\). Solving the equation, we obtain \[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.\]
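For a quick illustration (a worked example of the formula above), take \(f(x) = x^2 - 2\) and \(x_0 = 1\). The iteration gives \[x_1 = 1 - \frac{1^2 - 2}{2\cdot 1} = 1.5, \qquad x_2 = 1.5 - \frac{1.5^2 - 2}{2\cdot 1.5} = \frac{17}{12} \approx 1.4167,\] already quite close to \(\sqrt 2 \approx 1.4142\).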
If ever \(f'(x_n) = 0\), we are not able to proceed, but this is generally rare. The derivative of a non-pathological function cannot have too many roots without the original being constant, and the iterates are scattered somewhat unpredictably, so they usually dodge them.
Proposition 10.1 Let \(f\) be a differentiable function. Suppose a Newton’s method sequence \((x_n)_{n\in \mathbb N}\) for \(f\) converges to \(x\) and \(f'(x)\neq 0\). Then \(f(x) = 0\).
Proof. Let \(x = \lim_{n\to \infty} x_n\), and simply take the limit of the Newton recursion:
\[\begin{align*} \lim_{n\to\infty} x_{n+1} &= \lim_{n\to\infty} x_n - \frac{f(x_n)}{f'(x_n)},\\ x &= x - \frac{f(x)}{f'(x)}. \end{align*}\]
Cleaning up the arithmetic reveals \(f(x) = 0\).
The method is quite easy to implement in code. Here, we assume that \(f\) can be symbolically differentiated, but one could easily write a variation which takes the derivative as separate input.
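Here is a minimal sketch of such an implementation in Python, assuming the SymPy library for the symbolic differentiation; the function name newton and its parameters are our own choices.

```python
import sympy as sp

def newton(f_expr, x_sym, x0, steps=20, tol=1e-12):
    """Newton's method for the symbolic expression f_expr in the symbol x_sym."""
    f = sp.lambdify(x_sym, f_expr)
    fprime = sp.lambdify(x_sym, sp.diff(f_expr, x_sym))  # symbolic derivative
    x = float(x0)
    for _ in range(steps):
        slope = fprime(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x_n) = 0, so the method cannot proceed")
        x_new = x - f(x) / slope     # root of the best linear approximation at x
        if abs(x_new - x) < tol:     # the iterates have stopped moving
            return x_new
        x = x_new
    return x

x = sp.Symbol("x")
print(newton(x**3 - 2, x, 1.0))      # about 1.2599210498948732
```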
It is worth noting that this implementation has some undesirable numerical features. If \(f'(x_n)\) is small, then small perturbations in its value lead to large changes in \(f(x_n)/f'(x_n)\), and it would be better for the method to be less sensitive to small errors. Geometrically, when \(f'(x_n)\) is small, the best linear approximation (BLA) at \(x_n\) is nearly flat, and so its root will be far from \(x_n\); this, of course, also causes issues for the convergence of the sequence, so poor numerical behavior might simply mean that Newton’s method is not a good choice for the function and guess.
10.1 Convergence and Error Analysis
With this in mind, we’d like a criterion which guarantees that this issue doesn’t occur, and that the sequence converges. It turns out that this small derivative issue is the only real issue.
Theorem 10.1 Let \(f\) be a twice differentiable function and \(r\) a simple root of \(f\), meaning \(f(r) = 0\) but \(f'(r) \neq 0\). Suppose \(f''\) is bounded by \(M\). Then for all \(x_0\) sufficiently close to \(r\), the Newton’s method sequence starting from \(x_0\) converges to \(r\).
In fact, there is a constant \(C\) such that the errors \(e_n = |r-x_n|\) are bounded \[e_{n+1} \leq C e_n^2.\]
One may take \(C = \frac 1 2 \frac M B\), where \(M\) is an upper bound for \(|f''|\) and \(B\) is a positive lower bound for \(|f'|\) near \(r\).
Proof. By Proposition 10.1, our main task is to verify the sequence converges.
Consider the BLA at \(x_n\), \[L_n(x) = f(x_n) + f'(x_n)(x-x_n).\]
By Theorem 1.10, the Lagrange remainder formula, the error of this approximation can be written as \[f(x) - L_n(x) = \frac{f^{(2)}(t_n)}{2!}(x-x_n)^{2}\] for some \(t_n\) between \(x\) and \(x_n\). So with our bound for \(f''\), we can bound
\[|f(x) - L_n(x)| \leq \frac{M}{2}(x-x_n)^2.\]
Evaluating this at \(x=r\), knowing \(f(r) = 0\), we obtain \[|f(x_n) + f'(x_n)(r-x_n)| \leq \frac{M}{2}(r-x_n)^2.\]
Meanwhile, \[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)},\] so the signed errors satisfy \[r - x_{n+1} = (r - x_n) + \frac{f(x_n)}{f'(x_n)};\] multiplying through by \(f'(x_n)\) gives \[f'(x_n)(r - x_{n+1}) = f(x_n) + f'(x_n)(r-x_n),\] so the inequality just obtained yields \[|f'(x_n)|\, e_{n+1} \leq \frac{M}{2} e_n^2.\]
This tells us that the errors are inductively bounded, \[e_{n+1} \leq \frac 12 \frac{M}{|f'(x_n)|} e_n^2.\]
The lower bound for \(f'\) lets us bound the coefficient \[\frac 12 \frac{M}{|f'(x_n)|} < \frac 12 \frac M B.\]
Given some \(0 < \delta < 1\), if \(e_n < \delta/C\), then \[e_{n+1} \leq C e_n^2 < \delta\, e_n < \delta\,\frac\delta{C},\] so the error decreases by at least a factor of \(\delta\) at each step while remaining below \(\delta/C\); hence \(e_n \to 0\), meaning \(x_n\to r\). This establishes convergence, and the error bound \[e_{n+1} \leq C e_n^2\] was obtained along the way.
One can squeeze out small improvements from the theorem in various ways. For example, we know that \(f'\) is continuous and \(f'(r)\neq 0\), so there is always a positive lower bound \(B\) for \(|f'|\) on a small enough neighborhood of \(r\). Once \(\delta\) is small enough, we saw that the next iterate is even closer, hence still in the neighborhood, so we retain the bound \(C\) on the coefficient.
A typical case is for \(f\) to be twice continuously differentiable with \(f''(r),f'(r) > 0\), so that \(f\) is strictly increasing and convex in a sufficiently small neighborhood of \(r\). The proof is much simpler in this situation.
Theorem 10.2 Suppose \(f\) is twice continuously differentiable and has a root \(r\) with \(f''(r),f'(r) > 0\). Then for all \(x_0\) sufficiently close to \(r\), Newton’s method converges to \(r\).
In fact, if \(f'',f' > 0\) everywhere, then Newton’s method always converges.
Proof. (alternate)
Restricting to a small enough neighborhood, we may assume that \(f\) is strictly increasing and convex. It can only have one root in this interval.
Consider the iteration function \[N(x) = x - \frac{f(x)}{f'(x)}\] so that \(x_n = N^n(x_0)\). We will show that, for starting points in a small interval \((r,r+\epsilon)\), this sequence is monotonically decreasing and bounded below by \(r\).
If \(x > r\) then monotonicity of \(f\) ensures \(f(x) > f(r) = 0\), and we already know \(f'(x) > 0\), so that \(N(x)\) is \(x\) minus a positive number, hence less than \(x\).
Now, notice that \[N'(x) = \frac{f(x)f''(x)}{f'(x)^2},\] and the right hand side is nonnegative when \(x\geq r\) (and positive for \(x > r\)), so \(N\) is an increasing function there. That means if \(x \geq y\), then \(N(x) \geq N(y)\). In particular, \(x\geq r\), so \(N(x) \geq N(r) = r\).
This means that the sequence \(x_n\) converges if we started from \(x_0 > r\) in the neighborhood, and that its limit is also in the neighborhood. Since Proposition 10.1 tells us that its limit is a root, and the only root here is \(r\), that must be the limit.
Now consider \(x_0 < r\). In this case, \(f(x_0) < f(r) = 0\) and so \(N'\) is negative, hence order reversing. That means \(N(x_0) > N(r) = r\), and so after the first step \(x_1\) is on the correct side of \(r\) for the argument above. Because we know \(f'\) is bounded away from zero near \(r\), the function \(N\) is continuous in a small enough neighborhood of \(r\). Therefore, given \(\epsilon > 0\) there is a \(\delta\) such that \(|x_1 - r| = |N(x_0) - N(r)| < \epsilon\) if \(|x_0 - r| < \delta\). In particular, we may take \(\epsilon\) to be such that this puts \(x_1\) in the right-side neighborhood of \(r\) where convergence is assured.
Finally, if \(f',f''>0\) everywhere, then the entire real line is “sufficiently small”.
Note that this argument would actually work for any fixed combination of signs of \(f'(r)\) and \(f''(r)\), as long as both are nonzero; it’s just that the “good” side would change.
Corollary 10.1 Suppose \(f\) is twice differentiable and \(f',f''\) never vanish. Then Newton’s method converges from any starting point.
Example 10.1 Suppose you want to calculate the cube root of a real number \(a\). This is a root of the function \(f(x) = x^3 - a\). Notice that this function satisfies the hypotheses of Corollary 10.1 except possibly at \(x=0\). That means Newton’s method converges from any starting point whose iterates never land exactly on \(x=0\), which is almost always the case.
For example, running just five steps with \(a=2\) starting from a guess \(x_0 = 1\) results in \[\sqrt[3]2 \approx 1.2599210498948732...\] which is accurate in all 16 decimal places. Meanwhile, five steps of bisection method, using \(1\) and \(2\) as endpoints, results in \[\sqrt[3]2 \approx 1.28125,\] which is not even accurate in the second decimal place!
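For the curious, this comparison is easy to reproduce; the following sketch (the helper names are ours) runs five Newton steps from \(x_0 = 1\) and five bisection midpoints on \([1,2]\) for \(x^3 = 2\).

```python
def newton_cbrt2(x, steps=5):
    # Newton's method for f(x) = x^3 - 2.
    for _ in range(steps):
        x = x - (x**3 - 2) / (3 * x**2)
    return x

def bisect_cbrt2(lo=1.0, hi=2.0, steps=5):
    # Bisection for the same function; return the last midpoint computed.
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid**3 - 2 > 0:
            hi = mid
        else:
            lo = mid
    return mid

print(newton_cbrt2(1.0))   # 1.2599210498948732
print(bisect_cbrt2())      # 1.28125
```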
Since we know that \(1^3 = 1\) and \(1.5^3 = 3.375\), there’s a root between them by IVT. Let’s work out the error bound on the interval \((1,1.5)\). Differentiating,
\[\begin{align*} f'(x) = 3x^2,\\ f''(x) = 6x. \end{align*}\] So a lower bound for \(f'\) is 3 and an upper bound for \(f''\) is \(9\), which means we can take \(C\) to be \[\frac 1 2 \frac {9}{3} = 1.5.\]
Since we know there’s a root in \((1,1.5)\) and our guess is at \(1\), our initial error is at most \(0.5\). Thus: \[\begin{align*} e_0 &< 0.5 \\ e_1 &\leq Ce_0^2 < 0.375\\ e_2 &\leq Ce_1^2 < 0.210938\\ e_3 &\leq Ce_2^2 < 0.066743\\ e_4 &\leq Ce_3^2 < 0.006682\\ e_5 &\leq Ce_4^2 < 0.000067 \end{align*}\]
At every step, the number of accurate decimal places essentially doubles. The reason our actual error is even better than this predicts is that our bound for \(C\) is quite coarse – at each step, the per-step coefficients \(\frac 12 M/|f'(x_n)|\) from the theorem can decrease too. Also, our upper bound for \(f''\) is far from sharp, since near the root it could be taken as low as \(6\sqrt[3]2 \approx 7.56\). The lower bounds for \(f'\) increase toward \(3\sqrt[3]4 \approx 4.76\) as well.
These convergence facts are conceptually useful, and tell us that Newton’s method applied to a sufficiently good guess near a simple root of a reasonable function will always converge, and do so rapidly, but the proofs don’t provide much insight into how close the guess must be. Nor is the error bound entirely effective unless we have enough information to bound \(f''\).
Compared to bisection method, the tradeoff comes down to guaranteed but slow convergence against fast but not guaranteed convergence.
We proved a convergence theorem for twice-differentiable functions with good bounds on those derivatives on an entire neighborhood. Another analysis, carried out by Stephen Smale in Newton’s method estimates from data at one point, makes a stronger assumption about the function, namely that it be analytic, but only requires bounding derivatives at the initial guess.
In particular, for an analytic function (everywhere given by power series) Smale defines a function \(\alpha(x_0,f)\) such that Newton’s method is guaranteed to converge for \(f\) starting from \(x_0\) if \(\alpha(x_0,f) < 0.1307...\).
The constant is the smallest real root of the polynomial \[4x^4 - 16x^3 + 20x^2 - 10x + 1.\]
Since Smale’s work, the constant has been improved to \[\dfrac{13 - 3\sqrt{17}}{4} = 0.15767...\] independently by Royden and Han-Wang.
For analytic functions, global behavior can be determined locally. The quantity \(\alpha\) captures some behavior of the derivatives at \(x_0\), which, roughly speaking, analyticity spreads into good behavior everywhere else. The actual proof is more difficult than this sketch suggests, but more or less self-contained (and a fun detour for a version of this class with analysis as a prerequisite).
10.2 Division-Free Variation
From Newton’s Method without Division by Blanchard and Chamberland.
We saw previously that Neumann series can let us carry out division using only addition and multiplication: if \(|1-a| < 1\) then \[\frac 1 a = \frac{1}{1-(1-a)} = \sum_{k=0}^\infty (1-a)^k,\] so we can estimate \(1/a\) with the partial sums. A recursive presentation of the partial sums, \[x_{n+1} = 1 + (1-a)x_n,\] can be evaluated more quickly than the naive expansion. The error bound is
\[\left| \frac 1 a - x_n\right| \leq |1-a|^{n+1} \left|\frac 1 a\right|\]
so the improvement in the estimate is linear, like bisection method.
What if we tried Newton’s method instead? In this case, we want a root of \[f(x) = \frac 1 x - a,\] and the Newton iteration is \[x_{n+1} = x_n + (x_n - ax_n^2),\] which we could rewrite as \[x_{n+1} = x_n(2 - ax_n).\] It requires only one more arithmetic operation than the Neumann iteration, but we know that when it converges it will do so much, much faster (asymptotically, at least). In fact, we have a convergence guarantee for any initial guess between \(0\) and \(1/a\).
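To make the comparison concrete, here is a small sketch (the names are ours) running both iterations for the same value of \(a\); note that each uses only addition and multiplication.

```python
def neumann_reciprocal(a, steps):
    # Partial sums of the Neumann series for 1/a, in recursive form.
    x = 1.0
    for _ in range(steps):
        x = 1 + (1 - a) * x
    return x

def newton_reciprocal(a, x0, steps):
    # Newton's method for f(x) = 1/x - a, rewritten without division.
    x = x0
    for _ in range(steps):
        x = x * (2 - a * x)
    return x

a = 0.7
print(neumann_reciprocal(a, 5))       # about 1.42753
print(newton_reciprocal(a, 0.5, 5))   # about 1.42857, much closer to 1/a
print(1 / a)                          # 1.4285714285714286
```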
The scheme proposed by Blanchard and Chamberland uses this iteration to replace the division in Newton’s method with multiplication and addition:
\[\begin{align*} y_{n+1} &= y_n (2 - f'(x_n) y_n),\\ x_{n+1} &= x_n - y_{n+1} f(x_n). \end{align*}\]
The first equation approximates \(1/f'(x_n)\) using the iteration we just found for \(1/a\). It is straightforward to check that if the sequences \((x_n)\) and \((y_n)\) converge, respectively, to \(x\) and \(y\) and \(y\neq 0\) then \(x\) is a root of \(f\) and \(y = 1/f'(x)\).
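Below is a minimal sketch of this coupled iteration, with the derivative supplied by hand; the test function is the one from Blanchard and Chamberland’s Figure 2, but the starting guesses are our own.

```python
def division_free_newton(f, fprime, x0, y0, steps=20):
    # y_n estimates 1/f'(x_n) via the multiply-only reciprocal iteration,
    # and x_n is then updated with a Newton-like step that avoids division.
    x, y = x0, y0
    for _ in range(steps):
        y = y * (2 - fprime(x) * y)
        x = x - y * f(x)
    return x

f = lambda t: t**3 - t**2 - 1
fprime = lambda t: 3 * t**2 - 2 * t
print(division_free_newton(f, fprime, x0=1.5, y0=0.3))  # near the real root 1.4655...
```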
Blanchard and Chamberland show that these sequences converge quadratically. Modifying their proof to use the Lagrange remainder, one can show that \[e_{n+1} \leq 3n\, \frac{1}{2}\frac{M}{B}\, e_n^2\] where, as before, \(M\) is an upper bound for \(f''\) and \(B\) is a lower bound for \(f'\). The \(3n\) factor slows the approximation slightly, but its effect is negligible for large \(n\), since \(e_n^2\) shrinks so rapidly. Mainly, it requires that the initial guess be much better than in plain Newton’s method. This is witnessed in their Figure 2, where we see that the division-free method applied to \(f(x) = x^3 - x^2 - 1\) converges for far fewer starting guesses than the usual method.
10.3 Higher Dimension
Unlike bisection method, Newton’s method adapts very well to higher dimensional situations. The derivative still furnishes a best linear approximation for functions from \(\mathbb R^n\) to \(\mathbb R^n\), \[L_{f,a}(x) = f(a) + f'(a)(x-a),\] so we can solve \(L_{f,a}(x) = 0\) as long as \(f'(a)\) is nonsingular, \[x = a - f'(a)^{-1}f(a),\] so the same iteration \[x_{n+1} = x_n - f'(x_n)^{-1}f(x_n)\] can still be used. As before, it’s easy to see that if it converges, then it converges to a root. However, it’s much harder to analyze the error in higher dimensions because there is no mean value theorem.
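For illustration, here is a sketch of the higher-dimensional iteration, assuming NumPy; the example system (the unit circle intersected with the parabola \(y = x^2\)) is our own. Rather than forming \(f'(x_n)^{-1}\) explicitly, it is customary to solve the linear system \(f'(x_n)\,d = f(x_n)\) and set \(x_{n+1} = x_n - d\).

```python
import numpy as np

def newton_nd(f, jac, x0, steps=20):
    # Multivariable Newton's method: solve f'(x_n) d = f(x_n) at each step.
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        d = np.linalg.solve(jac(x), f(x))
        x = x - d
    return x

# f(x, y) = (x^2 + y^2 - 1, y - x^2): where the unit circle meets y = x^2.
f = lambda v: np.array([v[0]**2 + v[1]**2 - 1, v[1] - v[0]**2])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]], [-2 * v[0], 1.0]])
print(newton_nd(f, jac, [1.0, 1.0]))   # approximately [0.78615, 0.61803]
```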
Exercises
Exercise 10.1 Consider the following recursion: \[x_{n+1} = \frac{x_n^2 + 1}{2x_n}.\] Describe this recursion in terms of Newton’s method; find a function \(f\) such that this is the associated NM recursion. If \(x_0 = 10\), determine whether \(\lim_{n\rightarrow \infty} x_n\) exists, and, if so, evaluate it.
Hint: you can find \(f\) by guessing, or by solving the implied differential equation, which ends up being separable and solvable with partial fractions (or \(u\)-substitution, but the choice of substitution amounts to guessing \(f\)).
Exercise 10.2 Consider sequences defined by a recurrence of the form \[x_{n+1} = \frac{x_n^2}{2x_n - 3}.\]
- Interpret a sequence of the above form in terms of Newton’s method; find a function \(f\) such that the above is the NM recursion associated to \(f\).
- If \(x_0 = 8\), use the previous part to show that \(\lim_{n\rightarrow\infty} x_n\) exists, and evaluate it.
Exercise 10.3 Determine a polynomial \(f(x)\) of degree \(3\) such that Newton’s method applied to it with an initial guess of \(x=2\) fails to converge because the iterates alternate between \(2\) and another value.
Hint: there are a lot of free parameters – you can pick the other value \(x\) in the NM sequence, as well as \(f(2)\) and \(f(x)\), after which Newton’s method forces two constraints on \(f'\) – what are they, and why? In total, you will have a system of four linear equations in four variables to solve. A picture could also help.
Exercise 10.4 Generalize the previous exercise as follows. Let \(f(t) = at^3 + bt^2 + ct + d\) be a cubic polynomial and \(x\neq y\) two points.
- Given two points \((x,f(x))\) and \((y,f(y))\), determine the NM constraints on \(f'(x)\) and \(f'(y)\) that ensure the NM sequence is \(x,y,x,y,...\).
- Observe that you have \(4\) linear constraints on \(f\). Summarize them in matrix form.
- Using a computer algebra system, calculate the determinant (as a polynomial in \(x\) and \(y\)). Then, show that it is nonzero.
Exercise 10.5 Using Newton’s method, determine an efficient and accurate method for estimating the unique positive \(6\)th root of a positive real number.
Exercise 10.6 Let \(f(x) = x^2\). Prove that although \(x=0\) is not a simple root, Newton’s method always converges to it from any starting point.
Exercise 10.7 Assume \(f\) is twice continuously differentiable, that \(f'' >C>0\), and that our sequence converges to some \(x\) which is a double root of \(f\), so both \(f(x)\) and \(f'(x)\) are zero. Establish an error bound of the form \[|e_n| \leq \Box |e_{n-1}|,\] where \(\Box\) is a constant.
Hint: use the mean value theorem to write \[\frac{f'(x_n) - f'(r)}{x_n-r} = f''(\zeta),\] which you can rearrange to gain control over the quantity \[\frac{e_n}{f'(x_n)},\] without using a lower bound for \(f'\), but at the cost of an \(e_n\) in the numerator (hence the decreased exponent). Use this estimate in place of \(C\) in the proof of the convergence theorem for simple roots.
Exercise 10.8 Assume you are in the situation of Exercise 10.7. For each of the following values of the constant, would you rather use Newton’s method or bisection? Justify your answer.

1. \(\Box=0.8\)
2. \(\Box=0.5\)
3. \(\Box=0.0001\)
Explain why the choice of Newton vs bisection might not be easy if the constant is something like \(\Box=0.45\). What factors might you consider to make your decision?
Exercise 10.9 Newton’s method can be applied to complex functions too. In Example 10.1, even for a real number \(a\), there are two non-real cube roots. If \(x_0\) is real, will Newton’s method find those roots? If \(x_0\) is complex, will Newton’s method ever converge to the real root?
Exercise 10.10 Show that Newton’s method always diverges for \(f(x) = x^2 + 1\).
Exercise 10.11 If \(x=a\) is a double root of \(f(x)\), then it will be just a simple root of \(g(x) = \frac{f(x)}{x-a}\). Apply Newton’s method to \(g\) to obtain an iteration formula in terms of \(f\) alone.
Exercise 10.12 If \(x=a\) is a double root of \(f(x)\), then it will be just a simple root of \(g(x) = f(x) - (x-a)\). Apply Newton’s method to \(g\).
Exercise 10.13 If \(x=a\) is a double root of \(f(x)\), then it will “sort of” be a simple root of \(g(x) = \sqrt{|f(x)|}\). Apply Newton’s method to \(g\). Why does this seem more promising than the previous two ways of fixing repeated roots? Does it have problems?
Explain the issues, especially with our convergence theorem and error analyses in mind. Describe a small adjustment which fixes this problem.
Generalize this approach to a root of multiplicity \(k\).
Exercise 10.14 Linear approximations work well; what about quadratic ones? Describe a version of Newton’s method which uses the best quadratic approximation. What kind of error formula do you expect?
Exercise 10.15 Look at the application of Newton’s method to \(\arctan\) in the appendix. It succeeds for some \(x\) and fails for others. Determine for which \(x\) values it succeeds and for which it fails. Hint: use the symmetry of the function and some geometry.
Programming Problems
Exercise 10.16 Implement a version of Newton’s method using tolerances: if \(f'(x_n)\) is too small or if \(f(x_n)\) is already nearly zero, the function should exit early.
Exercise 10.17 Implement Newton’s method for multivariable functions.